Calibration Data Quality Assessment

IRT Item Response Validation Report

Published

November 10, 2025

Executive Summary

This report provides a comprehensive quality assessment of the IRT calibration dataset, which combines data from 6 studies (NE20, NE22, NE25, NSCH21, NSCH22, USA24) for psychometric modeling.

Data Quality Flags Detected

The validation system detected 3 types of data quality issues:

  1. Category Mismatch: Observed response categories differ from codebook expectations

    • Invalid values: Response values not defined in codebook (e.g., value=9 when only {0,1,2} expected)
    • Fewer categories: One or more expected response categories are never observed (often due to ceiling/floor effects)
  2. Negative Age-Response Correlation: Items where older children score lower than younger children (developmentally unexpected). Only checked for Kidsights Measurement Tool items; GSED-PF psychosocial items are excluded as they are not developmental.

  3. Non-Sequential Response Values: Response values with gaps (e.g., {0,1,9} instead of {0,1,2}), suggesting undocumented missing codes
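The production checks live in scripts/irt_scoring/validate_calibration_quality.R; as an illustrative sketch only, the three per-item checks above could be expressed along these lines (the function name, argument names, and data layout here are assumptions, not the pipeline's actual code):

```r
# Hypothetical sketch of the three quality checks for a single item.
# `values` = observed responses, `expected` = codebook-defined categories,
# `ages` = child ages (used only for developmental items).
check_item <- function(values, expected, ages = NULL) {
  obs <- sort(unique(values[!is.na(values)]))
  out <- list(
    # 1a. Invalid values: observed but not defined in the codebook
    invalid_values   = setdiff(obs, expected),
    # 1b. Fewer categories: defined in the codebook but never observed
    fewer_categories = setdiff(expected, obs),
    # 3. Non-sequential values: gaps such as {0, 1, 9}
    non_sequential   = length(obs) > 1 && any(diff(obs) > 1)
  )
  # 2. Negative age-response correlation (developmental items only)
  if (!is.null(ages)) {
    out$neg_age_cor <- cor(ages, values, use = "complete.obs") < 0
  }
  out
}

# Example: value 9 observed when only {0, 1, 2} are expected
check_item(c(0, 1, 9, 1, 0), expected = 0:2)
```

For GSED-PF psychosocial items, `ages` would simply be omitted so the age-correlation check is skipped, mirroring the exclusion described above.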


Overall Dataset Summary


Flag Summary Statistics


Flag Distribution by Type


Detailed Flag Report

This section provides a comprehensive, filterable table of all detected quality flags.

Interpretation Guide
  • ERROR flags (red): Require immediate attention; invalid data values were detected
  • WARNING flags (yellow): Noteworthy patterns that may affect modeling
  • Instruments (color-coded):
    • Kidsights Measurement Tool - Developmental items (checked for age correlation)
    • GSED-PF - Psychosocial items (excluded from age checks)
  • Use column filters to focus on specific studies, flag types, or severities
  • Click “CSV” or “Excel” buttons to export data for further analysis
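A filterable table with CSV/Excel export of this kind is typically rendered with the DT package. A minimal configuration along these lines would produce the column filters and export buttons described above (the exact options used by this report are an assumption):

```r
library(DT)

# Sketch: quality_flags is the flag table shown in this section.
datatable(
  quality_flags,
  filter = "top",                  # per-column filter boxes
  extensions = "Buttons",          # enables the export toolbar
  options = list(
    dom = "Bfrtip",                # place buttons above the table
    buttons = c("csv", "excel")    # export to CSV or Excel
  )
)
```

The `filter = "top"` argument supplies the per-column filters, while the Buttons extension provides the CSV and Excel downloads.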

Item Explorer: Top Flagged Items

This section shows age-response relationships for the most frequently flagged items.

Interactive Version

For a fully interactive item explorer with dropdown selection, run this report as a Shiny app:

library(quarto)
quarto::quarto_preview("docs/irt_scoring/calibration_quality_report.qmd")

This static version shows the top 10 flagged items for quick review.

#| label: plot-all-top-items
#| results: asis

for (item in top_items) {
  cat(sprintf("## %s\n\n", item))

  # Get flags for this item
  item_flags <- quality_flags %>%
    filter(item_id == item) %>%
    select(Study = study, `Flag Type` = flag_type,
           Severity = flag_severity, Description = description)

  cat("Flags for this item:\n\n")
  print(knitr::kable(item_flags, format = "markdown"))
  cat("\n")

  # Plot age-response relationship
  p <- plot_item_age_response(item, calibration_data)
  if (!is.null(p)) {
    print(p)
  } else {
    cat("No data available for plotting\n")
  }

  cat("\n---\n\n")
}


About This Report
  • Data Source: calibration_dataset_2020_2025 table (harmonized lex_equate names)
  • Validation Function: scripts/irt_scoring/validate_calibration_quality.R
  • Generated: 2025-11-10 10:46:23.586172
  • Studies: NE20 (n=37,546), NE22 (n=2,431), NE25 (n=3,507), NSCH21 (n=1,000 sampled), NSCH22 (n=1,000 sampled), USA24 (n=1,600)

For questions or to report data quality concerns, contact the Calibration Pipeline Team.